Towards Transient Fault Tolerance for Heterogeneous Computing Platforms

Authors

  • Nishant George
  • John Lach
  • Sudhanva Gurumurthi
  • Charles L. Brown
Abstract

The computing demands of applications coupled with the power wall problem in modern processors are expected to pave the way for heterogeneous computing platforms that are composed of a variety of processors and hardware accelerators. While current heterogeneous platform design analyses assess area, performance, and power, the tremendous increase in transient fault rates requires that reliability analyses also be included, especially since fault protection mechanisms can directly affect the aforementioned area, performance, and power analyses – and they affect these metrics differently when implemented on different processing components. Heterogeneous platform design therefore requires accurate characterization of fault protection mechanisms when used in different processing components. This work-in-progress report details the first step in this direction, providing a characterization of various transient fault protection mechanisms in ASICs and FPGAs.

Related resources

Improving the palbimm scheduling algorithm for fault tolerance in cloud computing

Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...

Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid

Grid computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling are important issues for large-scale computational grids because of the unreliable nature of grid resources. A commonly exploited technique to realize fault tolerance is periodic checkpointing, which periodically ...

HeteroPar'2010: Eighth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms

Networks of computers are now the most common and available parallel architecture. Unlike dedicated parallel computer systems, networks are inherently heterogeneous. They consist of diverse computers of different performance interconnected via heterogeneous network equipment providing communication links with different latencies and bandwidths. Traditional parallel algorithms and tools are aime...

Fault Tolerant Deflecting Router with High Fault Coverage for On-chip Network

Continuous scaling of CMOS technology makes it possible to integrate a large number of heterogeneous devices that need to communicate efficiently on a single chip. Efficient routers are therefore needed to carry the communication between these devices. As the chip scales, the probability of both permanent and transient faults is also increasing, making Fault Tolerance (FT) a key concern in sca...

Design and Analysis of Transient Fault Tolerance for Multi Core Architecture

This paper describes a software approach to fault tolerance for shared-memory multi-core systems using PLR. PLR takes a software-centric approach to transient fault tolerance, which ensures correct software execution. The scheme operates at the user-space level and does not necessitate changes to the original application. PLR creates a set of redundant processes per application process. In this scheme ...

Publication date: 2008